Overview

Dataset statistics

Number of variables: 33
Number of observations: 29986
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 35
Duplicate rows (%): 0.1%
Total size in memory: 7.8 MiB
Average record size in memory: 272.0 B
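
The overview figures can be recomputed from the underlying table. The sketch below is a minimal illustration with pandas; `dataset_statistics` and the toy frame are hypothetical names that do not appear in the report, and `DataFrame.duplicated()` is only assumed to approximate how the report counts duplicate rows. As a consistency check on the reported numbers, 35 / 29986 ≈ 0.12% (displayed as 0.1%) and 272.0 B × 29986 ≈ 7.8 MiB.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Recompute overview-style figures for a DataFrame."""
    n_obs, n_vars = df.shape
    missing = int(df.isna().sum().sum())
    # Rows identical to an earlier row; the report may count duplicates differently.
    duplicates = int(df.duplicated().sum())
    total_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / (n_obs * n_vars),
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs,
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_obs,
    }


# Hypothetical toy frame, not the profiled dataset.
toy = pd.DataFrame({"a": [1, 2, 2, 3], "b": [0.5, 1.5, 1.5, 2.5]})
stats = dataset_statistics(toy)
print(stats["duplicate_rows"], stats["duplicate_rows_pct"])
```

On the toy frame the third row repeats the second, so one of four rows is reported as a duplicate.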

Variable types

Numeric: 28
Categorical: 5

Warnings

Dataset has 35 (0.1%) duplicate rows [Duplicates]
SEX is highly correlated with SE_MA and 1 other field [High correlation]
AGE is highly correlated with AgeBin [High correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 [High correlation]
BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
SE_MA is highly correlated with SEX [High correlation]
AgeBin is highly correlated with AGE [High correlation]
SE_AG is highly correlated with SEX [High correlation]
Closeness_6 is highly correlated with Closeness_5 [High correlation]
Closeness_5 is highly correlated with Closeness_6 and 1 other field [High correlation]
Closeness_4 is highly correlated with Closeness_5 [High correlation]
Closeness_3 is highly correlated with Closeness_2 [High correlation]
Closeness_2 is highly correlated with Closeness_3 and 1 other field [High correlation]
Closeness_1 is highly correlated with Closeness_2 [High correlation]
PAY_AMT2 is highly skewed (γ1 = 30.46741983) [Skewed]
PAY_0 has 14733 (49.1%) zeros [Zeros]
PAY_2 has 15726 (52.4%) zeros [Zeros]
PAY_3 has 15762 (52.6%) zeros [Zeros]
PAY_4 has 16452 (54.9%) zeros [Zeros]
PAY_5 has 16944 (56.5%) zeros [Zeros]
PAY_6 has 16285 (54.3%) zeros [Zeros]
BILL_AMT1 has 2004 (6.7%) zeros [Zeros]
BILL_AMT2 has 2504 (8.4%) zeros [Zeros]
BILL_AMT3 has 2869 (9.6%) zeros [Zeros]
BILL_AMT4 has 3194 (10.7%) zeros [Zeros]
BILL_AMT5 has 3502 (11.7%) zeros [Zeros]
BILL_AMT6 has 4017 (13.4%) zeros [Zeros]
PAY_AMT1 has 5247 (17.5%) zeros [Zeros]
PAY_AMT2 has 5394 (18.0%) zeros [Zeros]
PAY_AMT3 has 5966 (19.9%) zeros [Zeros]
PAY_AMT4 has 6405 (21.4%) zeros [Zeros]
PAY_AMT5 has 6700 (22.3%) zeros [Zeros]
PAY_AMT6 has 7168 (23.9%) zeros [Zeros]
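
Warnings of the three kinds listed above (zeros, skewness, high correlation) can be approximated with a few pandas calls. This is a rough sketch, not the report's own logic: the function name `simple_warnings`, the three thresholds, and the use of Pearson correlation are all assumptions made for illustration, since the report does not state which correlation measure or cut-offs produced its warnings.

```python
import pandas as pd


def simple_warnings(df, corr_threshold=0.9, skew_threshold=20.0, zeros_threshold=0.05):
    """Return (column, kind, detail) tuples for the numeric columns of df.

    All thresholds are illustrative assumptions.
    """
    numeric = df.select_dtypes("number")
    found = []
    for col in numeric.columns:
        # Share of exact zeros in the column.
        zero_share = float((numeric[col] == 0).mean())
        if zero_share > zeros_threshold:
            found.append((col, "Zeros", round(100 * zero_share, 1)))
        # Sample skewness as computed by pandas.
        skew = float(numeric[col].skew())
        if abs(skew) > skew_threshold:
            found.append((col, "Skewed", skew))
    # Absolute Pearson correlation between every pair of numeric columns.
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > corr_threshold:
                found.append((a, "High correlation", b))
    return found


# Hypothetical toy frame: y is exactly 2 * x, z is unrelated.
toy = pd.DataFrame({"x": [0, 0, 1, 2, 3, 4], "y": [0, 0, 2, 4, 6, 8], "z": [5, 1, 4, 2, 6, 3]})
found = simple_warnings(toy)
for item in found:
    print(item)
```

On the toy frame, `x` and `y` are flagged both for zeros and for their perfect correlation, while `z` raises nothing.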

Reproduction

Analysis started: 2021-03-05 11:09:25.972318
Analysis finished: 2021-03-05 11:11:24.286302
Duration: 1 minute and 58.31 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml
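
The reported duration is the difference between the two timestamps. A short check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-03-05 11:09:25.972318", FMT)
finished = datetime.strptime("2021-03-05 11:11:24.286302", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```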

Variables

LIMIT_BAL
Real number (ℝ≥0)

Distinct: 81
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167461.1379
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 10000
5th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000
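
The range and interquartile range follow directly from the quantiles listed above. A minimal sketch using only the standard library; the helper `spread` and the toy list are illustrative assumptions, and the `inclusive` quantile method is used here because it interpolates linearly between order statistics, which is assumed to be comparable to what the report does.

```python
from statistics import quantiles


def spread(minimum, q1, q3, maximum):
    """Range and interquartile range from four summary values."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# Figures reported for LIMIT_BAL.
limit_bal = spread(minimum=10000, q1=50000, q3=240000, maximum=1000000)
print(limit_bal)  # {'range': 990000, 'iqr': 190000}

# Hypothetical toy data: quartiles by linear interpolation.
toy = [1, 2, 3, 4, 5]
q1, median, q3 = quantiles(toy, n=4, method="inclusive")
toy_spread = spread(min(toy), q1, q3, max(toy))
```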

Descriptive statistics

Standard deviation: 129760.9827
Coefficient of variation (CV): 0.7748722145
Kurtosis: 0.5368061035
Mean: 167461.1379
Median Absolute Deviation (MAD): 90000
Skewness: 0.9933272738
Sum: 5021489680
Variance: 1.683791264 × 10^10
Monotonicity: Not monotonic
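
Several of these descriptive statistics are simple functions of one another. The sketch below, which assumes nothing beyond the standard library, checks that the coefficient of variation is the standard deviation divided by the mean and that the variance is the squared standard deviation, using the LIMIT_BAL figures. The helper names and the toy list are hypothetical; the report's exact estimators for MAD and monotonicity are not shown in the source.

```python
from statistics import median


def coefficient_of_variation(std, mean):
    """CV = standard deviation / mean."""
    return std / mean


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Reported LIMIT_BAL figures.
std, mean = 129760.9827, 167461.1379
cv = coefficient_of_variation(std, mean)
variance = std ** 2

# Hypothetical toy data.
toy = [1, 2, 4, 7, 20]
toy_mad = median_absolute_deviation(toy)
print(round(cv, 6), f"{variance:.6e}", toy_mad, is_monotonic(toy))
```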
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
50000                3364     11.2%
20000                1976     6.6%
30000                1610     5.4%
80000                1567     5.2%
200000               1526     5.1%
150000               1109     3.7%
100000               1047     3.5%
180000               995      3.3%
360000               880      2.9%
60000                825      2.8%
Other values (71)    15087    50.3%

Value    Count    Frequency (%)
10000    493      1.6%
16000    2        < 0.1%
20000    1976     6.6%
30000    1610     5.4%
40000    230      0.8%

Value      Count    Frequency (%)
1000000    1        < 0.1%
800000     2        < 0.1%
780000     2        < 0.1%
760000     1        < 0.1%
750000     4        < 0.1%

SEX
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29986
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
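
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module reports the general category of each code point; the digits used as category codes here fall in `Nd`, which the report labels "Decimal Number". The helper `category_counts` is a hypothetical name, and the sample values are the five sample rows shown for SEX.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["2", "2", "2", "2", "1"]
counts = category_counts(sample)
print(counts)
```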

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

Histogram of lengths of the category

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

Most occurring characters

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    29986    100.0%

Most frequent character per category

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

Most occurring scripts

Value     Count    Frequency (%)
Common    29986    100.0%

Most frequent character per script

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    29986    100.0%

Most frequent character per block

Value    Count    Frequency (%)
2        18106    60.4%
1        11880    39.6%

EDUCATION
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29986
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2
ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%
Histogram of lengths of the category
ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%

Most occurring characters

ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29986
100.0%

Most frequent character per category

ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common29986
100.0%

Most frequent character per script

ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII29986
100.0%

Most frequent character per block

ValueCountFrequency (%)
214030
46.8%
110585
35.3%
34917
 
16.4%
5331
 
1.1%
4123
 
0.4%

MARRIAGE
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29986
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1
ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%
Histogram of lengths of the category
ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%

Most occurring characters

ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29986
100.0%

Most frequent character per category

ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Common29986
100.0%

Most frequent character per script

ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII29986
100.0%

Most frequent character per block

ValueCountFrequency (%)
215954
53.2%
113655
45.5%
3377
 
1.3%

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.48392583
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 21
5th percentile: 23
Q1: 28
Median: 34
Q3: 41
95th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.218534723
Coefficient of variation (CV): 0.2597946678
Kurtosis: 0.04472938413
Mean: 35.48392583
Median Absolute Deviation (MAD): 6
Skewness: 0.7325757746
Sum: 1064021
Variance: 84.98138243
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
291605
 
5.4%
271477
 
4.9%
281408
 
4.7%
301393
 
4.6%
261256
 
4.2%
311217
 
4.1%
251186
 
4.0%
341162
 
3.9%
321158
 
3.9%
331146
 
3.8%
Other values (46)16978
56.6%
ValueCountFrequency (%)
2167
 
0.2%
22560
1.9%
23931
3.1%
241127
3.8%
251186
4.0%
ValueCountFrequency (%)
791
 
< 0.1%
753
< 0.1%
741
 
< 0.1%
734
< 0.1%
723
< 0.1%

PAY_0
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.0164743547
Minimum: -2
Maximum: 8
Zeros: 14733
Zeros (%): 49.1%
Negative: 8438
Negative (%): 28.1%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123785345
Coefficient of variation (CV): -68.21422544
Kurtosis: 2.722011751
Mean: -0.0164743547
Median Absolute Deviation (MAD): 1
Skewness: 0.7323112099
Sum: -494
Variance: 1.262893502
Monotonicity: Not monotonic
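
The large negative coefficient of variation reported for PAY_0 is a consequence of dividing by a mean that is close to zero, not a sign of unusual dispersion. A small check with the reported figures (the function name is illustrative; the PAY_6 values are the ones reported for that variable in this document):

```python
def coefficient_of_variation(std, mean):
    """CV = standard deviation / mean; unstable when the mean is near zero."""
    return std / mean


# Reported (standard deviation, mean) pairs.
pay_0 = coefficient_of_variation(1.123785345, -0.0164743547)
pay_6 = coefficient_of_variation(1.149949673, -0.2906022811)
print(round(pay_0, 2), round(pay_6, 2))
```

The two standard deviations are nearly identical, yet the CVs differ by more than an order of magnitude because the PAY_0 mean is much closer to zero.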
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
014733
49.1%
-15682
 
18.9%
13685
 
12.3%
-22756
 
9.2%
22667
 
8.9%
3322
 
1.1%
476
 
0.3%
526
 
0.1%
819
 
0.1%
611
 
< 0.1%
ValueCountFrequency (%)
-22756
 
9.2%
-15682
 
18.9%
014733
49.1%
13685
 
12.3%
22667
 
8.9%
ValueCountFrequency (%)
819
 
0.1%
79
 
< 0.1%
611
 
< 0.1%
526
 
0.1%
476
0.3%

PAY_2
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1333622357
Minimum: -2
Maximum: 8
Zeros: 15726
Zeros (%): 52.4%
Negative: 9822
Negative (%): 32.8%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.19720764
Coefficient of variation (CV): -8.977111352
Kurtosis: 1.570309426
Mean: -0.1333622357
Median Absolute Deviation (MAD): 0
Skewness: 0.7904586596
Sum: -3999
Variance: 1.433306134
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015726
52.4%
-16044
 
20.2%
23927
 
13.1%
-23778
 
12.6%
3326
 
1.1%
499
 
0.3%
128
 
0.1%
525
 
0.1%
720
 
0.1%
612
 
< 0.1%
ValueCountFrequency (%)
-23778
 
12.6%
-16044
 
20.2%
015726
52.4%
128
 
0.1%
23927
 
13.1%
ValueCountFrequency (%)
81
 
< 0.1%
720
 
0.1%
612
 
< 0.1%
525
 
0.1%
499
0.3%

PAY_3
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1658440606
Minimum: -2
Maximum: 8
Zeros: 15762
Zeros (%): 52.6%
Negative: 10012
Negative (%): 33.4%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.19682557
Coefficient of variation (CV): -7.216571797
Kurtosis: 2.085375266
Mean: -0.1658440606
Median Absolute Deviation (MAD): 0
Skewness: 0.8406315527
Sum: -4973
Variance: 1.432391445
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015762
52.6%
-15931
 
19.8%
-24081
 
13.6%
23818
 
12.7%
3240
 
0.8%
476
 
0.3%
727
 
0.1%
623
 
0.1%
521
 
0.1%
14
 
< 0.1%
ValueCountFrequency (%)
-24081
 
13.6%
-15931
 
19.8%
015762
52.6%
14
 
< 0.1%
23818
 
12.7%
ValueCountFrequency (%)
83
 
< 0.1%
727
 
0.1%
623
 
0.1%
521
 
0.1%
476
0.3%

PAY_4
Real number (ℝ)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2203695058
Minimum: -2
Maximum: 8
Zeros: 16452
Zeros (%): 54.9%
Negative: 10025
Negative (%): 33.4%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.169106503
Coefficient of variation (CV): -5.305209987
Kurtosis: 3.498525213
Mean: -0.2203695058
Median Absolute Deviation (MAD): 0
Skewness: 0.9997163765
Sum: -6608
Variance: 1.366810015
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
016452
54.9%
-15681
 
18.9%
-24344
 
14.5%
23158
 
10.5%
3180
 
0.6%
469
 
0.2%
758
 
0.2%
535
 
0.1%
65
 
< 0.1%
82
 
< 0.1%
ValueCountFrequency (%)
-24344
 
14.5%
-15681
 
18.9%
016452
54.9%
12
 
< 0.1%
23158
 
10.5%
ValueCountFrequency (%)
82
 
< 0.1%
758
0.2%
65
 
< 0.1%
535
0.1%
469
0.2%

PAY_5
Real number (ℝ)

ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2658240512
Minimum: -2
Maximum: 8
Zeros: 16944
Zeros (%): 56.5%
Negative: 10074
Negative (%): 33.6%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.133216354
Coefficient of variation (CV): -4.26303169
Kurtosis: 3.990186627
Mean: -0.2658240512
Median Absolute Deviation (MAD): 0
Skewness: 1.008135047
Sum: -7971
Variance: 1.284179306
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016944
56.5%
-15532
 
18.4%
-24542
 
15.1%
22626
 
8.8%
3178
 
0.6%
484
 
0.3%
758
 
0.2%
517
 
0.1%
64
 
< 0.1%
81
 
< 0.1%
ValueCountFrequency (%)
-24542
 
15.1%
-15532
 
18.4%
016944
56.5%
22626
 
8.8%
3178
 
0.6%
ValueCountFrequency (%)
81
 
< 0.1%
758
0.2%
64
 
< 0.1%
517
 
0.1%
484
0.3%

PAY_6
Real number (ℝ)

ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2906022811
Minimum: -2
Maximum: 8
Zeros: 16285
Zeros (%): 54.3%
Negative: 10622
Negative (%): 35.4%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2
5th percentile: -2
Q1: -1
Median: 0
Q3: 0
95th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.149949673
Coefficient of variation (CV): -3.957125417
Kurtosis: 3.427731675
Mean: -0.2906022811
Median Absolute Deviation (MAD): 0
Skewness: 0.9479783161
Sum: -8714
Variance: 1.32238425
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016285
54.3%
-15733
 
19.1%
-24889
 
16.3%
22766
 
9.2%
3184
 
0.6%
449
 
0.2%
746
 
0.2%
619
 
0.1%
513
 
< 0.1%
82
 
< 0.1%
ValueCountFrequency (%)
-24889
 
16.3%
-15733
 
19.1%
016285
54.3%
22766
 
9.2%
3184
 
0.6%
ValueCountFrequency (%)
82
 
< 0.1%
746
0.2%
619
 
0.1%
513
 
< 0.1%
449
0.2%

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22715
Distinct (%): 75.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51241.74595
Minimum: -165580
Maximum: 964511
Zeros: 2004
Zeros (%): 6.7%
Negative: 590
Negative (%): 2.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -165580
5th percentile: 0
Q1: 3564.25
Median: 22393.5
Q3: 67141.25
95th percentile: 201217.75
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63577

Descriptive statistics

Standard deviation: 73647.45693
Coefficient of variation (CV): 1.437255027
Kurtosis: 9.801475321
Mean: 51241.74595
Median Absolute Deviation (MAD): 21810.5
Skewness: 2.663192136
Sum: 1536534994
Variance: 5423947912
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02004
 
6.7%
390244
 
0.8%
78076
 
0.3%
32672
 
0.2%
31663
 
0.2%
250059
 
0.2%
39649
 
0.2%
240039
 
0.1%
41629
 
0.1%
50025
 
0.1%
Other values (22705)27326
91.1%
ValueCountFrequency (%)
-1655801
< 0.1%
-1549731
< 0.1%
-153081
< 0.1%
-143861
< 0.1%
-115451
< 0.1%
ValueCountFrequency (%)
9645111
< 0.1%
7468141
< 0.1%
6530621
< 0.1%
6304581
< 0.1%
6266481
< 0.1%

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22339
Distinct (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49197.34079
Minimum: -69777
Maximum: 983931
Zeros: 2504
Zeros (%): 8.4%
Negative: 669
Negative (%): 2.2%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -69777
5th percentile: 0
Q1: 2986
Median: 21216
Q3: 64027.75
95th percentile: 194795
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61041.75

Descriptive statistics

Standard deviation: 71184.82114
Coefficient of variation (CV): 1.446924163
Kurtosis: 10.29805535
Mean: 49197.34079
Median Absolute Deviation (MAD): 20826
Skewness: 2.704551617
Sum: 1475231461
Variance: 5067278760
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02504
 
8.4%
390231
 
0.8%
32675
 
0.3%
78075
 
0.3%
31672
 
0.2%
250051
 
0.2%
39651
 
0.2%
240042
 
0.1%
-20029
 
0.1%
41628
 
0.1%
Other values (22329)26828
89.5%
ValueCountFrequency (%)
-697771
< 0.1%
-675261
< 0.1%
-333501
< 0.1%
-300001
< 0.1%
-262141
< 0.1%
ValueCountFrequency (%)
9839311
< 0.1%
7439701
< 0.1%
6715631
< 0.1%
6467701
< 0.1%
6244751
< 0.1%

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 22015
Distinct (%): 73.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47026.92403
Minimum: -157264
Maximum: 1664089
Zeros: 2869
Zeros (%): 9.6%
Negative: 655
Negative (%): 2.2%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -157264
5th percentile: 0
Q1: 2667.25
Median: 20091.5
Q3: 60174.5
95th percentile: 187835.75
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57507.25

Descriptive statistics

Standard deviation: 69360.88417
Coefficient of variation (CV): 1.474918583
Kurtosis: 19.77628324
Mean: 47026.92403
Median Absolute Deviation (MAD): 19711.5
Skewness: 3.087219116
Sum: 1410149344
Variance: 4810932253
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02869
 
9.6%
390275
 
0.9%
78074
 
0.2%
32663
 
0.2%
31662
 
0.2%
39648
 
0.2%
250040
 
0.1%
240039
 
0.1%
41629
 
0.1%
20026
 
0.1%
Other values (22005)26461
88.2%
ValueCountFrequency (%)
-1572641
< 0.1%
-615061
< 0.1%
-461271
< 0.1%
-340411
< 0.1%
-254431
< 0.1%
ValueCountFrequency (%)
16640891
< 0.1%
8550861
< 0.1%
6931311
< 0.1%
6896431
< 0.1%
6896271
< 0.1%

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 21540
Distinct (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43276.91476
Minimum: -170000
Maximum: 891586
Zeros: 3194
Zeros (%): 10.7%
Negative: 675
Negative (%): 2.3%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -170000
5th percentile: 0
Q1: 2329.25
Median: 19056
Q3: 54559.5
95th percentile: 174380.25
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52230.25

Descriptive statistics

Standard deviation: 64343.80078
Coefficient of variation (CV): 1.486792696
Kurtosis: 11.303771
Mean: 43276.91476
Median Absolute Deviation (MAD): 18660
Skewness: 2.8212682
Sum: 1297701566
Variance: 4140124699
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03194
 
10.7%
390246
 
0.8%
780101
 
0.3%
31668
 
0.2%
32662
 
0.2%
39644
 
0.1%
240039
 
0.1%
15039
 
0.1%
250034
 
0.1%
41633
 
0.1%
Other values (21530)26126
87.1%
ValueCountFrequency (%)
-1700001
< 0.1%
-813341
< 0.1%
-651671
< 0.1%
-506161
< 0.1%
-466271
< 0.1%
ValueCountFrequency (%)
8915861
< 0.1%
7068641
< 0.1%
6286991
< 0.1%
6168361
< 0.1%
5728051
< 0.1%

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 21005
Distinct (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40326.76256
Minimum: -81334
Maximum: 927171
Zeros: 3502
Zeros (%): 11.7%
Negative: 655
Negative (%): 2.2%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -81334
5th percentile: 0
Q1: 1765.75
Median: 18118.5
Q3: 50220.75
95th percentile: 165798.5
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48455

Descriptive statistics

Standard deviation: 60806.54835
Coefficient of variation (CV): 1.507846018
Kurtosis: 12.30059999
Mean: 40326.76256
Median Absolute Deviation (MAD): 17702.5
Skewness: 2.875731101
Sum: 1209238302
Variance: 3697436322
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03502
 
11.7%
390235
 
0.8%
78094
 
0.3%
31679
 
0.3%
32662
 
0.2%
15058
 
0.2%
39647
 
0.2%
240039
 
0.1%
250037
 
0.1%
41636
 
0.1%
Other values (20995)25797
86.0%
ValueCountFrequency (%)
-813341
< 0.1%
-613721
< 0.1%
-530071
< 0.1%
-466271
< 0.1%
-375941
< 0.1%
ValueCountFrequency (%)
9271711
< 0.1%
8235401
< 0.1%
5870671
< 0.1%
5517021
< 0.1%
5478801
< 0.1%

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 20597
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38887.44718
Minimum: -339603
Maximum: 961664
Zeros: 4017
Zeros (%): 13.4%
Negative: 688
Negative (%): 2.3%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -339603
5th percentile: 0
Q1: 1257
Median: 17097.5
Q3: 49227.75
95th percentile: 161912
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47970.75

Descriptive statistics

Standard deviation: 59563.15405
Coefficient of variation (CV): 1.531680745
Kurtosis: 12.26549564
Mean: 38887.44718
Median Absolute Deviation (MAD): 16781.5
Skewness: 2.845986675
Sum: 1166078991
Variance: 3547769321
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
04017
 
13.4%
390207
 
0.7%
78086
 
0.3%
15078
 
0.3%
31677
 
0.3%
32656
 
0.2%
39645
 
0.2%
41636
 
0.1%
-1833
 
0.1%
240032
 
0.1%
Other values (20587)25319
84.4%
ValueCountFrequency (%)
-3396031
< 0.1%
-2090511
< 0.1%
-1509531
< 0.1%
-946251
< 0.1%
-738951
< 0.1%
ValueCountFrequency (%)
9616641
< 0.1%
6999441
< 0.1%
5686381
< 0.1%
5277111
< 0.1%
5275661
< 0.1%

PAY_AMT1
Real number (ℝ≥0)

ZEROS

Distinct: 7939
Distinct (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5663.448743
Minimum: 0
Maximum: 873552
Zeros: 5247
Zeros (%): 17.5%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1000
Median: 2100
Q3: 5006
95th percentile: 18425.25
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4006

Descriptive statistics

Standard deviation: 16566.50987
Coefficient of variation (CV): 2.92516285
Kurtosis: 415.1242799
Mean: 5663.448743
Median Absolute Deviation (MAD): 1931.5
Skewness: 14.66660918
Sum: 169824174
Variance: 274449249.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05247
 
17.5%
20001363
 
4.5%
3000891
 
3.0%
5000698
 
2.3%
1500507
 
1.7%
4000426
 
1.4%
10000401
 
1.3%
1000365
 
1.2%
2500298
 
1.0%
6000294
 
1.0%
Other values (7929)19496
65.0%
ValueCountFrequency (%)
05247
17.5%
19
 
< 0.1%
214
 
< 0.1%
315
 
0.1%
418
 
0.1%
ValueCountFrequency (%)
8735521
< 0.1%
5050001
< 0.1%
4933581
< 0.1%
4239031
< 0.1%
4050161
< 0.1%

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 7892
Distinct (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5917.844061
Minimum: 0
Maximum: 1684259
Zeros: 5394
Zeros (%): 18.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 833
Median: 2009
Q3: 5000
95th percentile: 19001.75
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4167

Descriptive statistics

Standard deviation: 23040.83551
Coefficient of variation (CV): 3.893450939
Kurtosis: 1642.424329
Mean: 5917.844061
Median Absolute Deviation (MAD): 1991
Skewness: 30.46741983
Sum: 177452472
Variance: 530880101.1
Monotonicity: Not monotonic
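
The skewness reported for PAY_AMT2 reflects a long right tail: most payments are a few thousand, while the maximum exceeds 1.6 million. The sketch below computes the plain moment-based skewness (third central moment divided by the cubed standard deviation) on a hypothetical seven-value list that loosely echoes the quantiles above, and then on its `log1p` transform, one common way to inspect such a variable. Both the toy data and the choice of `log1p` are assumptions for illustration; the report's estimator may also apply a small-sample correction that this sketch omits.

```python
import math


def skewness(values):
    """Population skewness: third central moment / (second central moment) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical toy values, not the real column.
toy = [0, 0, 833, 2009, 5000, 19000, 1684259]
raw_skew = skewness(toy)
log_skew = skewness([math.log1p(v) for v in toy])
print(round(raw_skew, 2), round(log_skew, 2))
```

On the toy list the single extreme value dominates the raw skewness, while the transformed values are close to symmetric.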
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05394
 
18.0%
20001290
 
4.3%
3000857
 
2.9%
5000717
 
2.4%
1000594
 
2.0%
1500521
 
1.7%
4000410
 
1.4%
10000318
 
1.1%
6000283
 
0.9%
2500251
 
0.8%
Other values (7882)19351
64.5%
ValueCountFrequency (%)
05394
18.0%
115
 
0.1%
220
 
0.1%
318
 
0.1%
411
 
< 0.1%
ValueCountFrequency (%)
16842591
< 0.1%
12270821
< 0.1%
12154711
< 0.1%
10245161
< 0.1%
5804641
< 0.1%

PAY_AMT3
Real number (ℝ≥0)

ZEROS

Distinct: 7512
Distinct (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5224.000967
Minimum: 0
Maximum: 896040
Zeros: 5966
Zeros (%): 19.9%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 390
Median: 1800
Q3: 4503
95th percentile: 17534.75
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4113

Descriptive statistics

Standard deviation: 17609.29387
Coefficient of variation (CV): 3.370844298
Kurtosis: 564.2817568
Mean: 5224.000967
Median Absolute Deviation (MAD): 1795
Skewness: 17.21788413
Sum: 156646893
Variance: 310087230.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05966
 
19.9%
20001285
 
4.3%
10001102
 
3.7%
3000870
 
2.9%
5000721
 
2.4%
1500490
 
1.6%
4000381
 
1.3%
10000312
 
1.0%
1200243
 
0.8%
6000241
 
0.8%
Other values (7502)18375
61.3%
ValueCountFrequency (%)
05966
19.9%
113
 
< 0.1%
219
 
0.1%
314
 
< 0.1%
415
 
0.1%
ValueCountFrequency (%)
8960401
< 0.1%
8890431
< 0.1%
5082291
< 0.1%
4175881
< 0.1%
4009721
< 0.1%

PAY_AMT4
Real number (ℝ≥0)

ZEROS

Distinct: 6933
Distinct (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4826.639699
Minimum: 0
Maximum: 621000
Zeros: 6405
Zeros (%): 21.4%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 296
Median: 1500
Q3: 4013.75
95th percentile: 16011.75
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3717.75

Descriptive statistics

Standard deviation: 15669.21268
Coefficient of variation (CV): 3.246401981
Kurtosis: 277.2442185
Mean: 4826.639699
Median Absolute Deviation (MAD): 1500
Skewness: 12.90329242
Sum: 144731618
Variance: 245524226
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
06405
 
21.4%
10001394
 
4.6%
20001214
 
4.0%
3000887
 
3.0%
5000810
 
2.7%
1500441
 
1.5%
4000402
 
1.3%
10000341
 
1.1%
500258
 
0.9%
2500258
 
0.9%
Other values (6923)17576
58.6%
ValueCountFrequency (%)
06405
21.4%
122
 
0.1%
222
 
0.1%
313
 
< 0.1%
420
 
0.1%
ValueCountFrequency (%)
6210001
< 0.1%
5288971
< 0.1%
4970001
< 0.1%
4321301
< 0.1%
4000461
< 0.1%

PAY_AMT5
Real number (ℝ≥0)

ZEROS

Distinct: 6892
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4800.441706
Minimum: 0
Maximum: 426529
Zeros: 6700
Zeros (%): 22.3%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 251.5
Median: 1500
Q3: 4031
95th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779.5

Descriptive statistics

Standard deviation: 15281.70818
Coefficient of variation (CV): 3.183396261
Kurtosis: 179.9832984
Mean: 4800.441706
Median Absolute Deviation (MAD): 1500
Skewness: 11.12497653
Sum: 143946045
Variance: 233530604.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
06700
 
22.3%
10001340
 
4.5%
20001323
 
4.4%
3000947
 
3.2%
5000814
 
2.7%
1500426
 
1.4%
4000401
 
1.3%
10000343
 
1.1%
500250
 
0.8%
6000247
 
0.8%
Other values (6882)17195
57.3%
ValueCountFrequency (%)
06700
22.3%
121
 
0.1%
213
 
< 0.1%
313
 
< 0.1%
412
 
< 0.1%
ValueCountFrequency (%)
4265291
< 0.1%
4179901
< 0.1%
3880711
< 0.1%
3792671
< 0.1%
3320001
< 0.1%

PAY_AMT6
Real number (ℝ≥0)

ZEROS

Distinct: 6936
Distinct (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5216.533582
Minimum: 0
Maximum: 528666
Zeros: 7168
Zeros (%): 23.9%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 118
Median: 1500
Q3: 4000
95th percentile: 17369
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882

Descriptive statistics

Standard deviation: 17781.32692
Coefficient of variation (CV): 3.408648029
Kurtosis: 167.0906005
Mean: 5216.533582
Median Absolute Deviation (MAD): 1500
Skewness: 10.63858739
Sum: 156422976
Variance: 316175586.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
07168
23.9%
10001299
 
4.3%
20001295
 
4.3%
3000914
 
3.0%
5000808
 
2.7%
1500439
 
1.5%
4000411
 
1.4%
10000356
 
1.2%
500247
 
0.8%
6000220
 
0.7%
Other values (6926)16829
56.1%
ValueCountFrequency (%)
07168
23.9%
120
 
0.1%
29
 
< 0.1%
314
 
< 0.1%
412
 
< 0.1%
ValueCountFrequency (%)
5286661
< 0.1%
5271431
< 0.1%
4430011
< 0.1%
4220001
< 0.1%
4035001
< 0.1%

Default
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29986
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%
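
The percentages in this table are the counts divided by the number of observations. A quick check with the reported counts (the variable names are illustrative):

```python
# Reported counts for the two values of Default.
counts = {"0": 23350, "1": 6636}

total = sum(counts.values())
shares = {value: round(100 * n / total, 1) for value, n in counts.items()}
print(total, shares)
```

The counts add up to the 29986 observations in the dataset, and the shares reproduce the 77.9% and 22.1% shown above.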

Histogram of lengths of the category

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%

Most occurring characters

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    29986    100.0%

Most frequent character per category

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%

Most occurring scripts

Value     Count    Frequency (%)
Common    29986    100.0%

Most frequent character per script

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    29986    100.0%

Most frequent character per block

Value    Count    Frequency (%)
0        23350    77.9%
1        6636     22.1%

SE_MA
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.368638698
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 5
95th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.543187062
Coefficient of variation (CV): 0.4581040593
Kurtosis: -1.439433511
Mean: 3.368638698
Median Absolute Deviation (MAD): 1
Skewness: -0.3497360266
Sum: 101012
Variance: 2.381426308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
59407
31.4%
48467
28.2%
26547
21.8%
15188
17.3%
6232
 
0.8%
3145
 
0.5%
ValueCountFrequency (%)
15188
17.3%
26547
21.8%
3145
 
0.5%
48467
28.2%
59407
31.4%
ValueCountFrequency (%)
6232
 
0.8%
59407
31.4%
48467
28.2%
3145
 
0.5%
26547
21.8%

AgeBin
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29986
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 2
5th row: 4
ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%
Histogram of lengths of the category
ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%

Most occurring characters

ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29986
100.0%

Most frequent character per category

ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Common29986
100.0%

Most frequent character per script

ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII29986
100.0%

Most frequent character per block

ValueCountFrequency (%)
211231
37.5%
19617
32.1%
36459
21.5%
42340
 
7.8%
5339
 
1.1%

SE_AG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.103748416
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 6
Q3: 7
95th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.553899468
Coefficient of variation (CV): 0.5003968181
Kurtosis: -1.323168512
Mean: 5.103748416
Median Absolute Deviation (MAD): 2
Skewness: -0.3163160542
Sum: 153041
Variance: 6.522402491
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
76670
22.2%
66337
21.1%
24561
15.2%
83691
12.3%
13280
10.9%
32768
9.2%
91248
 
4.2%
41092
 
3.6%
5179
 
0.6%
10160
 
0.5%
ValueCountFrequency (%)
13280
10.9%
24561
15.2%
32768
9.2%
41092
 
3.6%
5179
 
0.6%
ValueCountFrequency (%)
10160
 
0.5%
91248
 
4.2%
83691
12.3%
76670
22.2%
66337
21.1%

Closeness_6
Real number (ℝ)

HIGH CORRELATION

Distinct: 23644
Distinct (%): 78.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6812992165
Minimum: -2.88555
Maximum: 2.50953
Zeros: 28
Zeros (%): 0.1%
Negative: 798
Negative (%): 2.7%
Memory size: 468.5 KiB

Quantile statistics

Minimum: -2.88555
5th percentile: 0.026124375
Q1: 0.4177436905
Median: 0.81459
Q3: 0.99219875
95th percentile: 1
Maximum: 2.50953
Range: 5.39508
Interquartile range (IQR): 0.5744550595

Descriptive statistics

Standard deviation: 0.3453089827
Coefficient of variation (CV): 0.5068389546
Kurtosis: 0.1707189498
Mean: 0.6812992165
Median Absolute Deviation (MAD): 0.18541
Skewness: -0.8546532032
Sum: 20429.43831
Variance: 0.1192382935
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       4017          13.4%
0.9922                    42           0.1%
0.997                     31           0.1%
0.961                     31           0.1%
0                         28           0.1%
0.995125                  26           0.1%
0.99805                   19           0.1%
0.99                      16           0.1%
0.9805                    16           0.1%
0.987                     15           0.1%
Other values (23634)   25745          85.9%

Value                  Count  Frequency (%)
-2.88555                   1         < 0.1%
-1.6941                    1         < 0.1%
-1.442066667               1         < 0.1%
-1.41605                   1         < 0.1%
-1.19405                   1         < 0.1%

Value                  Count  Frequency (%)
2.50953                    1         < 0.1%
2.212867857                1         < 0.1%
1.75                       1         < 0.1%
1.720865517                1         < 0.1%
1.5805                     1         < 0.1%

Closeness_5
Real number (ℝ)

HIGH CORRELATION

Distinct                         24065
Distinct (%)                     80.3%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             0.6667713841
Minimum                          -3.9355
Maximum                          1.876742857
Zeros                            20
Zeros (%)                        0.1%
Negative                         820
Negative (%)                     2.7%
Memory size                      468.5 KiB

Quantile statistics

Minimum                          -3.9355
5-th percentile                  0.0257
Q1                               0.39750625
Median                           0.7877099379
Q3                               0.9888571429
95-th percentile                 1
Maximum                          1.876742857
Range                            5.812242857
Interquartile range (IQR)        0.5913508929

Descriptive statistics

Standard deviation               0.3505512357
Coefficient of variation (CV)    0.5257442717
Kurtosis                         1.807782789
Mean                             0.6667713841
Median Absolute Deviation (MAD)  0.2117170318
Skewness                         -0.9281182575
Sum                              19993.80672
Variance                         0.1228861689
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       3502          11.7%
0.9922                    39           0.1%
0.9805                    36           0.1%
0.961                     34           0.1%
0.995125                  30           0.1%
0.997                     23           0.1%
0                         20           0.1%
0.987                     20           0.1%
0.9961                    19           0.1%
0.974                     16           0.1%
Other values (24055)   26247          87.5%

Value                  Count  Frequency (%)
-3.9355                    1         < 0.1%
-3.92625                   1         < 0.1%
-2.498716667               1         < 0.1%
-1.73208                   1         < 0.1%
-1.5335                    1         < 0.1%

Value                  Count  Frequency (%)
1.876742857                1         < 0.1%
1.7653                     1         < 0.1%
1.75                       1         < 0.1%
1.60962                    1         < 0.1%
1.47225                    1         < 0.1%

Closeness_4
Real number (ℝ)

HIGH CORRELATION

Distinct                         24440
Distinct (%)                     81.5%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             0.640378923
Minimum                          -4.14685
Maximum                          2.3745
Zeros                            14
Zeros (%)                        < 0.1%
Negative                         1018
Negative (%)                     3.4%
Memory size                      468.5 KiB

Quantile statistics

Minimum                          -4.14685
5-th percentile                  0.0137075
Q1                               0.3319189951
Median                           0.757585
Q3                               0.985672
95-th percentile                 1
Maximum                          2.3745
Range                            6.52135
Interquartile range (IQR)        0.6537530049

Descriptive statistics

Standard deviation               0.3686953778
Coefficient of variation (CV)    0.5757456477
Kurtosis                         1.3571981
Mean                             0.640378923
Median Absolute Deviation (MAD)  0.2407091176
Skewness                         -0.8346489936
Sum                              19202.40239
Variance                         0.1359362816
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       3194          10.7%
0.961                     47           0.2%
0.9922                    45           0.2%
0.9805                    36           0.1%
0.995125                  27           0.1%
0.987                     20           0.1%
0.997                     20           0.1%
0.99805                   18           0.1%
0.9961                    16           0.1%
0.9875                    15           0.1%
Other values (24430)   26548          88.5%

Value                  Count  Frequency (%)
-4.14685                   1         < 0.1%
-3.6455                    1         < 0.1%
-2.71595                   1         < 0.1%
-2.170033333               1         < 0.1%
-1.79824                   1         < 0.1%

Value                  Count  Frequency (%)
2.3745                     1         < 0.1%
2.0433                     1         < 0.1%
1.8758                     1         < 0.1%
1.75                       1         < 0.1%
1.653846154                1         < 0.1%

Closeness_3
Real number (ℝ)

HIGH CORRELATION

Distinct                         24726
Distinct (%)                     82.5%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             0.6076811577
Minimum                          -9.688575
Maximum                          2.0251
Zeros                            5
Zeros (%)                        < 0.1%
Negative                         1583
Negative (%)                     5.3%
Memory size                      468.5 KiB

Quantile statistics

Minimum                          -9.688575
5-th percentile                  -0.00171
Q1                               0.2445238871
Median                           0.7264225
Q3                               0.98395325
95-th percentile                 1
Maximum                          2.0251
Range                            11.713675
Interquartile range (IQR)        0.7394293629

Descriptive statistics

Standard deviation               0.3964647048
Coefficient of variation (CV)    0.6524222444
Kurtosis                         16.25047825
Mean                             0.6076811577
Median Absolute Deviation (MAD)  0.2716325
Skewness                         -1.326840115
Sum                              18221.9272
Variance                         0.1571842621
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       2869           9.6%
0.9922                    41           0.1%
0.9805                    38           0.1%
0.961                     31           0.1%
0.995125                  30           0.1%
0.987                     30           0.1%
0.997                     21           0.1%
0.9961                    20           0.1%
0.9875                    19           0.1%
0.99805                   16           0.1%
Other values (24716)   26871          89.6%

Value                  Count  Frequency (%)
-9.688575                  1         < 0.1%
-4.3914                    1         < 0.1%
-3.55805                   1         < 0.1%
-2.8632                    1         < 0.1%
-2.47605                   1         < 0.1%

Value                  Count  Frequency (%)
2.0251                     1         < 0.1%
1.925082353                1         < 0.1%
1.75                       1         < 0.1%
1.1872                     1         < 0.1%
1.16962                    1         < 0.1%

Closeness_2
Real number (ℝ)

HIGH CORRELATION

Distinct                         25076
Distinct (%)                     83.6%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             0.5887250858
Minimum                          -5.3805
Maximum                          2.39554
Zeros                            2
Zeros (%)                        < 0.1%
Negative                         1940
Negative (%)                     6.5%
Memory size                      468.5 KiB

Quantile statistics

Minimum                          -5.3805
5-th percentile                  -0.01036809955
Q1                               0.1933329545
Median                           0.7035604545
Q3                               0.981665
95-th percentile                 1
Maximum                          2.39554
Range                            7.77604
Interquartile range (IQR)        0.7883320455

Descriptive statistics

Standard deviation               0.4045605353
Coefficient of variation (CV)    0.6871807319
Kurtosis                         2.673035351
Mean                             0.5887250858
Median Absolute Deviation (MAD)  0.2937837677
Skewness                         -0.8175524053
Sum                              17653.51042
Variance                         0.1636692267
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       2504           8.4%
0.9922                    35           0.1%
0.987                     29           0.1%
0.9805                    29           0.1%
0.995125                  28           0.1%
0.961                     24           0.1%
0.997                     19           0.1%
0.9961                    15           0.1%
0.9930555556              14         < 0.1%
0.9875                    14         < 0.1%
Other values (25066)   27275          91.0%

Value                  Count  Frequency (%)
-5.3805                    1         < 0.1%
-4.4562                    1         < 0.1%
-3.2676                    1         < 0.1%
-2.7631                    1         < 0.1%
-2.7566                    1         < 0.1%

Value                  Count  Frequency (%)
2.39554                    1         < 0.1%
1.964657143                1         < 0.1%
1.2989                     1         < 0.1%
1.271733333                1         < 0.1%
1.254428571                1         < 0.1%

Closeness_1
Real number (ℝ)

HIGH CORRELATION

Distinct                         25557
Distinct (%)                     85.2%
Missing                          0
Missing (%)                      0.0%
Infinite                         0
Infinite (%)                     0.0%
Mean                             0.576078443
Minimum                          -5.4553
Maximum                          1.619892
Zeros                            8
Zeros (%)                        < 0.1%
Negative                         2115
Negative (%)                     7.1%
Memory size                      468.5 KiB

Quantile statistics

Minimum                          -5.4553
5-th percentile                  -0.01313026316
Q1                               0.1700104167
Median                           0.685640724
Q3                               0.9779548913
95-th percentile                 1
Maximum                          1.619892
Range                            7.075192
Interquartile range (IQR)        0.8079444746

Descriptive statistics

Standard deviation               0.4114713624
Coefficient of variation (CV)    0.7142627317
Kurtosis                         2.618953272
Mean                             0.576078443
Median Absolute Deviation (MAD)  0.3096397522
Skewness                         -0.8186595073
Sum                              17274.28819
Variance                         0.1693086821
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Value                  Count  Frequency (%)
1                       2004           6.7%
0.9922                    37           0.1%
0.9805                    36           0.1%
0.995125                  26           0.1%
0.9961                    23           0.1%
0.987                     23           0.1%
0.9875                    21           0.1%
0.997                     19           0.1%
0.961                     17           0.1%
0.9930555556              16           0.1%
Other values (25547)   27764          92.6%

Value                  Count  Frequency (%)
-5.4553                    1         < 0.1%
-4.3095                    1         < 0.1%
-3.1406                    1         < 0.1%
-2.9529                    1         < 0.1%
-2.7255                    1         < 0.1%

Value                  Count  Frequency (%)
1.619892                   1         < 0.1%
1.551933333                1         < 0.1%
1.2309                     1         < 0.1%
1.2                        1         < 0.1%
1.19604                    1         < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
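
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the toy data are made up for the example, and the population (divide-by-n) form is used for both the covariance and the standard deviations, which gives the same r as the sample form because the factors cancel.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Undefined (division by zero) if either variable is constant.
    return cov / (std_x * std_y)


# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_shifted)
```

Both printed values agree up to floating-point rounding, which is the location and scale invariance mentioned above.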

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
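
As a rough sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are hypothetical, and ties are given the average of the ranks they span, which is one common convention rather than something the report specifies.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r_linear = pearson_r(x, y)
print(rho, r_linear)
```

On this toy data ρ reaches its maximum because the relationship is perfectly monotonic, while Pearson's r stays below it because the relationship is not linear.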

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
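
The pair-counting definition can be written out directly. The sketch below implements exactly the ratio described here (often called tau-a), with a hypothetical function name and invented data; tied pairs count as neither concordant nor discordant. Library implementations may differ: `scipy.stats.kendalltau`, for example, defaults to the tie-adjusted tau-b variant, so results can diverge when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    total = 0
    for i, j in combinations(range(len(xs)), 2):
        total += 1
        # The product is positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / total


# Toy data, invented for illustration: 7 concordant and 3 discordant pairs.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

tau = kendall_tau_a(x, y)
print(tau)
```

This O(n²) loop is fine for small samples; for a dataset of the size profiled here, a library routine would be the practical choice.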

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
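
For readers who want to see what a bias-corrected Cramér's V looks like in practice, here is a small sketch based on the correction commonly attributed to Bergsma (2013). It is an assumption that the report generator computes it in precisely this way. The function name `cramers_v_corrected` and the contingency tables are invented for the example, and the code assumes at least a 2x2 table with no empty row or column.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Toy contingency tables, invented for illustration.
independent = [[10, 20], [20, 40]]   # rows are proportional: no association
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column
partial = [[30, 10], [10, 30]]

v = cramers_v_corrected(partial)
print(cramers_v_corrected(independent), cramers_v_corrected(perfect), v)
```

The independent table yields 0 and the perfectly associated table yields a value of 1 up to rounding, matching the range described above.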

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

       LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  Default  SE_MA  AgeBin  SE_AG  Closeness_6  Closeness_5  Closeness_4  Closeness_3  Closeness_2  Closeness_1
    0      20000    2          2         1   24      2      2     -1     -1     -2     -2       3913       3102        689          0          0          0         0       689         0         0         0         0        1      4       1      6         1.00         1.00         1.00         0.97         0.84         0.80
    1     120000    2          2         2   26     -1      2      0      0      0      2       2682       1725       2682       3272       3455       3261         0      1000      1000      1000         0      2000        1      5       1      6         0.97         0.97         0.97         0.98         0.99         0.98
    2      90000    2          2         2   34      0      0      0      0      0      0      29239      14027      13559      14331      14948      15549      1518      1500      1000      1000      1000      5000        0      5       2      7         0.83         0.83         0.84         0.85         0.84         0.68
    3      50000    2          2         1   37      0      0      0      0      0      0      46990      48233      49291      28314      28959      29547      2000      2019      1200      1100      1069      1000        0      4       2      7         0.41         0.42         0.43         0.01         0.04         0.06
    4      50000    1          2         1   57     -1      0     -1      0      0      0       8617       5670      35835      20940      19146      19131      2000     36681     10000      9000       689       679        0      1       4      4         0.62         0.62         0.58         0.28         0.89         0.83
    5      50000    1          1         2   37      0      0      0      0      0      0      64400      57069      57608      19394      19619      20024      2500      1815       657      1000      1000       800        0      2       2      2         0.60         0.61         0.61        -0.15        -0.14        -0.29
    6     500000    1          1         2   29      0      0      0      0      0      0     367965     412023     445007     542653     483003     473944     55000     40000     38000     20239     13750     13770        0      2       1      1         0.05         0.03        -0.09         0.11         0.18         0.26
    7     100000    2          2         2   23      0     -1     -1      0      0     -1      11876        380        601        221       -159        567       380       601         0       581      1687      1542        0      5       1      6         0.99         1.00         1.00         0.99         1.00         0.88
    8     140000    2          3         1   28      0      0      2      0      0      0      11285      14096      12108      12211      11793       3719      3329         0       432      1000      1000      1000        0      4       1      6         0.97         0.92         0.91         0.91         0.90         0.92
    9      20000    1          3         2   35     -2     -2     -2     -2     -1     -1          0          0          0          0      13007      13912         0         0         0     13007      1122         0        0      2       2      2         0.30         0.35         1.00         1.00         1.00         1.00

Last rows

       LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  Default  SE_MA  AgeBin  SE_AG  Closeness_6  Closeness_5  Closeness_4  Closeness_3  Closeness_2  Closeness_1
29990     140000    1          2         1   41      0      0      0      0      0      0     138325     137142     139110     138262      49675      46121      6000      7000      4228      1505      2000      2000        0      1       3      3         0.67         0.65         0.01     6.36e-03         0.02         0.01
29991     210000    1          2         1   34      3      2      2      2      2      2       2500       2500       2500       2500       2500       2500         0         0         0         0         0         0        1      1       2      2         0.99         0.99         0.99     9.88e-01         0.99         0.99
29992      10000    1          3         1   43      0      0      0     -2     -2     -2       8802      10400          0          0          0          0      2000         0         0         0         0         0        0      1       3      3         1.00         1.00         1.00     1.00e+00        -0.04         0.12
29993     100000    1          1         2   38      0     -1     -1      0      0      0       3042       1427     102996      70626      69473      55004      2000    111784      4000      3000      2000      2000        0      2       2      2         0.45         0.31         0.29    -3.00e-02         0.99         0.97
29994      80000    1          2         2   34      2      2      2      2      2      2      72557      77708      79384      77519      82607      81158      7000      3500         0      7000         0      4000        1      2       2      2        -0.01        -0.03         0.03     7.70e-03         0.03         0.09
29995     220000    1          3         1   39      0      0      0      0      0      0     188948     192815     208365      88004      31237      15980      8500     20000      5003      3047      5000      1000        0      1       2      2         0.93         0.86         0.60     5.29e-02         0.12         0.14
29996     150000    1          3         2   43     -1     -1     -1     -1      0      0       1683       1828       3502       8979       5190          0      1837      3526      8998       129         0         0        0      2       3      3         1.00         0.97         0.94     9.77e-01         0.99         0.99
29997      30000    1          2         2   37      4      3      2     -1      0      0       3565       3356       2758      20878      20582      19357         0         0     22000      4200      2000      3100        1      2       2      2         0.35         0.31         0.30     9.08e-01         0.89         0.88
29998      80000    1          3         1   41      1     -1      0      0      0     -1      -1645      78379      76304      52774      11855      48944     85900      3409      1178      1926     52964      1804        1      1       3      3         0.39         0.85         0.34     4.62e-02         0.02         1.02
29999      50000    1          2         1   46      0      0      0      0      0      0      47929      48905      49764      36535      32428      15313      2078      1800      1430      1000      1000      1000        1      1       3      3         0.69         0.35         0.27     4.72e-03         0.02         0.04

Duplicate rows

Most frequent

       LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  Default  SE_MA  AgeBin  SE_AG  Closeness_6  Closeness_5  Closeness_4  Closeness_3  Closeness_2  Closeness_1  count
    0      20000    1          2         2   24      2      2      4      4      4      4       1650       1650       1650       1650       1650       1650         0         0         0         0         0         0        1      2       1      1         0.92         0.92         0.92         0.92         0.92         0.92      2
    1      50000    1          2         2   26      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      2       1      1         1.00         1.00         1.00         1.00         1.00         1.00      2
    2      50000    2          1         2   23      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      5       1      6         1.00         1.00         1.00         1.00         1.00         1.00      2
    3      80000    2          2         1   31     -2     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      4       2      7         1.00         1.00         1.00         1.00         1.00         1.00      2
    4      80000    2          2         2   25     -2     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      5       1      6         1.00         1.00         1.00         1.00         1.00         1.00      2
    5      80000    2          3         1   42     -2     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      4       3      8         1.00         1.00         1.00         1.00         1.00         1.00      2
    6      90000    2          1         2   31      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      5       2      7         1.00         1.00         1.00         1.00         1.00         1.00      2
    7     100000    2          2         1   49      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      4       3      8         1.00         1.00         1.00         1.00         1.00         1.00      2
    8     110000    2          1         2   31      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      5       2      7         1.00         1.00         1.00         1.00         1.00         1.00      2
    9     140000    1          1         2   29      1     -2     -2     -2     -2     -2          0          0          0          0          0          0         0         0         0         0         0         0        0      2       1      1         1.00         1.00         1.00         1.00         1.00         1.00      2